The determination of relatedness between individuals in a 
family is crucial in the analysis of common complex diseases. We 
present a method to infer close inter-familial relationships 
based on SNP genotyping data and provide the relationship 
coefficient of kinship in Korean families. We obtained blood 
samples from 43 Korean individuals in two families. SNP data 
were obtained using the Affymetrix Genome-wide Human SNP 
array 6.0 and the Illumina Human 1M-Duo chip. To measure 
the kinship coefficient with the SNP genotyping data, we 
considered all possible pairs of individuals in each family. The 
genetic distance between two individuals in a pair was 
determined using the allele sharing distance method. The 
results show that genetic distance is proportional to the kinship 
coefficient and that a close degree of kinship can be confirmed 
with SNP genotyping data. This study represents the first 
attempt to identify the genetic distance between very closely 
related individuals. [BMB Reports 2013; 46(6): 305-309] 



INTRODUCTION 

Human genetic variations, referring to all of the genetic
characteristics observed within the human genome, are
responsible for human diversity (1). Among these genetic
variations, single nucleotide polymorphisms (SNPs) are the most
common genetic variations between individuals. SNPs are present
at a frequency of approximately 1 in every 1,000 bases in the
human genome (2). Variations in the human genome can affect
how humans develop diseases and respond to pathogens and
drugs. SNPs are also thought to be a key to realizing the
concept of personalized medicine (3).

Current genotyping technologies allow the analysis of a set
of a million SNPs spread across the whole genome for a few
hundred dollars per person (4). Thus, large-scale studies
involving hundreds of thousands of SNPs and thousands of
individuals are feasible. Genome-wide association studies
(GWAS) have been widely used to identify common variants
that contribute to variation in complex human phenotypes and
diseases (5).

Pedigree integrity is important in population-based data with
unknown family structures (6). Furthermore, family-based
linkage analysis has been tremendously successful in identifying
genes underlying diseases with Mendelian inheritance patterns
(7). High-throughput genotyping presents opportunities for
pedigree error detection using millions of SNPs and for the
identification of the degree of relatedness between a pair of
individuals. Knowing the relationships between family
members, as in a pedigree, can help produce a more accurate
estimate of each individual's haplotypes, i.e., sequences of
alleles on a chromosome (8).

The determination of relatedness between individuals in a
family pedigree is particularly important for the analysis of
common complex diseases (9). Inferring the distance of the blood
relationship between close or distant relatives is the first key
step in various disease-gene association studies. Here, we
defined 'kinship coefficient' as the level of the relationship
between two persons related by blood, such as parent to child,
one sibling to another, grandparent to grandchild, uncle to
nephew, or first cousins. Previous studies have mainly focused
on the analysis of inter-generational relationships and
pedigree comparisons (8, 10-12). However, few attempts have
been made to determine the genetic distances of very closely
related individuals based on SNP genotyping data.

In this study, we present a method to infer close
inter-familial relationships from SNP genotyping data and to
rank kinship with a kinship coefficient of the 1st-6th degree in
Korean families. The kinship coefficient can be used to verify
relationships, reconstruct pedigrees, detect pedigree errors,
analyze forensic DNA data, and identify unknown relationships
among family members. In this article, we calculated allele
sharing distances (ASD) (13) from SNP genotyping data from
43 individuals of two different Korean families, measured the
uncertainty of the calculated ASD scores, and translated
these scores into kinship coefficients to identify the degree of
kinship among Koreans.

RESULTS AND DISCUSSION 

We evaluated SNP markers grouped by four genomic regions
(CDS, UTR, intron, and intergenic), CoRS, and ADME functions
from the two microarray platforms in 43 individuals from two
Korean families (Table 1). We considered 298 pairs (42 pairs in
C_family and 256 pairs in K_family) of individuals from the
same family and classified their degree of kinship from 1st to
6th degree relatives (Table 2). The ASD values of the pairs were
calculated using eq. 1. We constructed an automatic pipeline
for calculating ASDs from SNP genotyping data.

We obtained the genetic distance scores for 1st-6th degree
relatives based on data from the Affymetrix and Illumina
platforms. On both platforms, we found that genetic distance
is proportional to the degree of kinship. Fig. 1 shows the
genetic distance scores for 1st-6th degree relatives based on
data from the Affymetrix platform. It also shows that genetic
distance can be measured for all degrees of kinship, but the
difference in genetic distance becomes smaller as the degree
of kinship becomes higher. These results also show that the
genetic distance between 2nd degree relatives, such as
sibling/sibling, is smaller than that between 1st degree
relatives, such as parent/child. Theoretically, parent/child
and sibling/sibling pairs both share 50% of their genetic
material. However, the degree of genetic relatedness for
siblings is not necessarily 50%, unlike the absolute 50% ratio
between parents and children. The actual ratio between siblings
can vary from 100% at one extreme (identical twins) to an
exceedingly unlikely 0%. We used a two-sample t-test to test
whether the genetic distances between 1st and 2nd degree
relatives are statistically different. We found that their
values were significantly different (Table 3). If both father
and mother are from a genetically isolated community, their
child will likely have many more similar genes. In that case,
siblings will have more than 50% similarity, but the similarity
depends on how similar their mother and father are.
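
The two-sample comparison of 1st and 2nd degree distances can be illustrated with a short sketch. The text does not state which variant of the t-test was used, so the unequal-variance (Welch) form below is an assumption, and the function name `welch_t` is hypothetical. The sketch returns only the test statistic and the degrees of freedom; converting these to a P value requires the t distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumption: unequal-variance (Welch) form; the source does not
    say which two-sample t-test variant was applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

Here `sample_a` and `sample_b` would hold the ASD values of all 1st degree pairs and all 2nd degree pairs, respectively, for one marker set on one platform.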

Genetic distances were also calculated for non-blood related
individuals to validate the distances for each degree of
kinship. We measured the genetic distance between C_family
members and K_family members, who do not share common ancestors
in their genealogical histories. We found that the distance
between non-blood related individuals was higher than that for
6th degree relatives.

We assigned the genetic distance values in this study as a
standard of the 1st-6th degree and officially registered these
distance values as reference standards for the 1st-6th degrees
of Korean kinship in the National Center for Standard Reference
Data (NCSRD). Using these data, the degree of kinship between
two unknown individuals in a Korean family can be inferred or
corrected by simply comparing their genetic distance with the
values of the registered reference standards.

We have constructed an FTP site
(ftp://ftp.kobic.re.kr/pub/genomesrd) for these genomic
reference standards. Users can download the SNP genotyping data
for the 43 analyzed Korean individuals and the genetic distance
scores for the 1st-6th degrees of Korean kinship. Users can also
download the genetic distance values from the homepage of the
NCSRD (http://www.srd.re.kr), written in Korean.

We have proposed a method to infer close relationships
between two individuals using high-density genotyping data and
developed the reference standards of degree of kinship in
Korean families. Our approach is the first attempt to calculate
the genetic distances of very closely (blood) related
individuals. Our approach, based on the allele sharing between
any pair of individuals, was sufficient to classify relative
pairs as parent-offspring pairs, twins, full siblings, or 2nd,
3rd, 4th, 5th, or 6th degree relatives.

Our method can be performed rapidly for a single pedigree
or pair of individuals, and will be useful for a wide range of
applications, including forensic DNA analysis (assuming that
current forensics technology transitions to high-density SNP
genotyping) and relationship testing and correction. Knowing the
exact degree of relatedness is also important in determining
the relationships between long-separated family members and
when performing organ transplantation. Our results will be
further applied toward automated pedigree reconstruction and
association mapping in the absence of a pre-specified pedigree
or in the presence of unknown genetic relatedness in the sample.
However, our study has an important limitation in that the
reference standard in the results is valid only for Korean
families.

Table 1. Selected SNP markers from Affymetrix and Illumina platforms

Platform     CDS      UTR      Intron    Intergenic   ADME^a   CoRS^b
Affymetrix   8,611    7,348    327,494   559,793      13,648   306,149
Illumina     33,437   24,969   430,956   646,093      12,404   306,149

a Absorption, distribution, metabolism, and excretion, which describe the disposition of a pharmaceutical compound within an organism. b Common
SNP rs number present in both the Affymetrix and Illumina microarray chips.

Table 2. The number of relationships in the C_family and K_family

Degree of relationship   C_family   K_family   Total
1                        12         32         44
2                        11         26         37
3                        7          53         60
4                        12         65         77
5                        0          60         60
6                        0          20         20
Total                    42         256        298

Fig. 1. Genetic distances of 1st-6th (or 4th) degree relatives
and unrelated individuals calculated using the ASD algorithm in
the Affymetrix platform: (A) Choi family, (B) Kang family. The
9th degree relative on the X axis of the Kang family panel
represents unrelated individuals. CDS and UTR represent 'coding
sequence' and 'untranslated region', respectively. CoRS means
'common SNP rs number', i.e., SNPs present on both the
Affymetrix and Illumina microarray chips. ADME is an acronym in
pharmacokinetics and pharmacology for absorption, distribution,
metabolism, and excretion, and describes the disposition of a
pharmaceutical compound within an organism.

Table 3. P values of two-sample t-test between 1st and 2nd degrees of kinship in Affymetrix and Illumina platforms

Platform     CDS        UTR        Intron     Intergenic   ADME       CoRS
Affymetrix   2.33E-09   6.11E-08   2.64E-10   5.45E-09     5.68E-10   1.81E-04
Illumina     3.65E-10   1.26E-09   1.87E-11   1.28E-10     3.57E-11   9.42E-04
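
As a rough illustration of how a registered reference standard might be used to infer the degree of kinship of an unknown pair, the sketch below assigns the degree whose reference distance is closest to the observed distance. The nearest-value rule, the function name `infer_degree`, and the numbers in `placeholder_reference` are all assumptions made for illustration; the placeholder numbers are arbitrary and are not the registered NCSRD values.

```python
def infer_degree(distance, reference):
    """Return the kinship degree whose reference distance is nearest.

    `reference` maps degree of kinship (int) to a reference genetic
    distance. Nearest-value matching is an assumed decision rule.
    """
    return min(reference, key=lambda degree: abs(reference[degree] - distance))


# Arbitrary placeholder values for illustration only; they are NOT
# the reference standards registered in the NCSRD.
placeholder_reference = {1: 0.20, 2: 0.22, 3: 0.26, 4: 0.28, 5: 0.29, 6: 0.30}

print(infer_degree(0.255, placeholder_reference))
```

A more careful rule could use the confidence interval of each degree instead of a single value, and report "undetermined" when an observed distance falls outside every interval.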

MATERIALS AND METHODS 

DNA extraction and genotyping 

Blood samples were obtained from 43 Korean individuals from
two families, the Choi (C_family) and Kang (K_family) families.
The C_family and K_family consisted of 19 and 24 individuals,
respectively, whose relatedness was known up to the 6th degree.
Genomic DNA (gDNA) was extracted from peripheral blood
leukocytes using the QIAamp DNA Stool Kit (Qiagen) according
to the manufacturer's instructions. This study was approved by
the Institutional Review Board (IRB) of the Faculty of Medicine
at The Catholic University of Korea. Informed consent was
obtained from all participants.

SNP data were obtained using the Affymetrix Genome-wide
Human SNP array 6.0 and Illumina Human 1M-Duo chips as
recommended by the manufacturers. We obtained 934,969
(Affymetrix) and 1,199,187 (Illumina) SNP calls with a sample
call rate of >99%, reproducibility of >99.9%, and Mendelian
inconsistency of ≤0.1%. We filtered SNP markers with a minor
allele frequency >0.05% and Hardy-Weinberg equilibrium (HWE)
$P < 10^{-6}$. The filtered SNPs were used for subsequent
genetic distance analysis.
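
The marker filtering step can be sketched as below. This is a minimal illustration under several assumptions: genotype counts per SNP are available as (AA, AB, BB), HWE is tested with a 1-degree-of-freedom chi-square goodness-of-fit test (the text does not name the test used), and a marker is kept when its minor allele frequency exceeds `maf_min` and its HWE P value is not below `hwe_p_min`. The function names and the direction of both thresholds are assumptions, since the text does not spell out which markers were retained.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_p_value(n_aa, n_ab, n_bb):
    """P value of a chi-square (1 df) goodness-of-fit test for HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(counts, maf_min, hwe_p_min):
    """Keep a marker if MAF > maf_min and HWE P >= hwe_p_min (assumed rule)."""
    return (minor_allele_frequency(*counts) > maf_min
            and hwe_p_value(*counts) >= hwe_p_min)
```

With the thresholds read literally from the text, a call would look like `passes_qc((10, 20, 10), maf_min=0.0005, hwe_p_min=1e-6)`.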

Selection of SNP markers 

From the filtered Affymetrix and Illumina SNP markers, we
selected SNP loci using three different criteria. First, we
selected SNP markers that were located in four chromosomal
regions (CDS, UTR, intron, and intergenic) based on a SNP
annotation file (version 128) from the UCSC genome browser (14)
and annotation data from Affymetrix and Illumina. Second, we
extracted common SNP rs numbers (CoRS), present on both the
Affymetrix and Illumina microarray chips. Finally, we obtained
information about the effects of these SNP markers on ADME
(15). ADME describes the disposition of a pharmaceutical
compound within an organism and influences the drug levels
and kinetics of drug exposure to the tissues. We downloaded
protein information of ADME functions from the
Pharmainformatics database
(http://bidd.nus.edu.sg/group/admeap/admeap.asp) and obtained
SNP markers with effects on ADME by mapping into dbSNP (16)
loci using in-house Python scripts.

Allele sharing distance (ASD)

To measure kinship coefficients using SNP genotyping data,
we considered all possible pairs of individuals in each family;
married couples were excluded because they are genetically
unrelated to each other. We classified these pairs as 1st to
6th degree relatives according to their degree of kinship.
Then, the genetic distance (D) between the two individuals in
each pair was calculated. The distance between individuals $i$
and $j$ was defined as

$$D_k = \frac{1}{U} \sum S_{ij} \qquad \text{(eq. 1)}$$

where $S_{ij}$ is the number of alleles shared between
individuals $i$ and $j$, $U$ is the total number of loci
evaluated, and $k$ is the degree of kinship. $D_k$ is the
genetic distance for the $k$th degree of kinship in a family.
From this distance, we obtained ASD values for the $k$th degree
of kinship using a simple function, defined as
$\mathrm{ASD}_k = 1 - D_k$. If two individuals have the same
alleles at nearly all loci, the ASD value will be close to 0;
if individuals have no alleles in common, the ASD will be close
to 1. We calculated the ASD values for all the degrees of
kinship from the two families.
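
The pairwise calculation described in this subsection might be sketched as follows. Several details are assumptions rather than the authors' pipeline: genotypes are coded as the number of copies of one allele (0, 1, or 2), missing calls are `None` and skipped, and the shared-allele count (0, 1, or 2 per locus) is divided by 2 per locus so that the resulting ASD stays between 0 and 1, as the text describes. The names `shared_alleles`, `asd`, and `pairwise_asd` are hypothetical.

```python
from itertools import combinations


def shared_alleles(g1, g2):
    """Alleles shared at one locus (0, 1, or 2) for 0/1/2-coded genotypes."""
    return 2 - abs(g1 - g2)


def asd(genotypes_i, genotypes_j):
    """Allele sharing distance between two individuals, in [0, 1]."""
    loci = [(a, b) for a, b in zip(genotypes_i, genotypes_j)
            if a is not None and b is not None]  # skip missing calls
    if not loci:
        raise ValueError("no loci genotyped in both individuals")
    # Assumed normalization: 2 alleles per locus, so sharing is in [0, 1].
    sharing = sum(shared_alleles(a, b) for a, b in loci) / (2 * len(loci))
    return 1.0 - sharing


def pairwise_asd(family, excluded_pairs=frozenset()):
    """ASD for all pairs in one family, skipping excluded pairs.

    `family` maps an individual ID to a genotype list; `excluded_pairs`
    holds frozensets of IDs (e.g. married couples) to leave out.
    """
    return {
        (a, b): asd(family[a], family[b])
        for a, b in combinations(sorted(family), 2)
        if frozenset((a, b)) not in excluded_pairs
    }
```

Grouping the resulting values by the known degree of each pair and averaging would then give one ASD value per degree of kinship.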



Measurement of uncertainty of genetic distance 

To obtain a confidence interval (CI) of ASD for each degree of
kinship, we calculated the sample mean and standard error of
the mean for the kth degree. We calculated a CI with a 95%
confidence level. These confidence intervals are defined as the
uncertainty of the measurement, the dispersion of the values
that could reasonably be attributed to the measured variable.
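
A minimal sketch of this interval calculation is given below, assuming the interval is the sample mean plus or minus a normal quantile times the standard error of the mean. The text does not say whether a normal or a t quantile was used; for degrees with few pairs a t quantile would give a wider interval. The function name `confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95):
    """Return (lower, upper) bounds of a CI for the mean of `values`.

    Assumption: normal approximation (z quantile); the source does not
    state which quantile was used.
    """
    n = len(values)
    center = mean(values)
    sem = stdev(values) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return center - z * sem, center + z * sem
```

Here `values` would be the ASD values of all pairs classified into one degree of kinship.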